Skip to content

Refactor26#326

Draft
bnlawrence wants to merge 19 commits intomainfrom
refactor26
Draft

Refactor26#326
bnlawrence wants to merge 19 commits intomainfrom
refactor26

Conversation

@bnlawrence
Copy link
Copy Markdown
Collaborator

@bnlawrence bnlawrence commented Apr 9, 2026

Description

This is a massive refactor of pyactivestorage to establish a complete separation of concerns between storage format, active itself, and backend reductions. The objective is shown in a set of UML diagrams, but in particular these two:

and

If agreed this is the right approach, we wouldn't merge this into main, we'd replace main with it!

Key review objectives should be:

  1. Does the testing strategy here cover all the testing that we had in main? (It should, the main testing was pulled in test by test, rather than by merge).

  2. Is the new structure more logical and likely easier to maintain, and in particular to extend?

  3. IS there any signficant performance degradation associated with it.

Note that documentation has not been updated. We can do that when we are happy with the broad structure.

Bryan Lawrence added 12 commits April 5, 2026 09:42
…p.cfg, README.md, .gitignore, environment.yml; fix s3_exploratory import paths
… updated backends and tests

- Backport activestorage/reductionist.py with v2 API (axis parameter, interface_type, option_disable_chunk_cache)
- Update S3Backend and HttpsBackend to use new reductionist.reduce_chunk() signature
- Change API endpoint from v1 to v2
- Backport test_reductionist.py and test_reductionist_json.py from main
- Update test_storage_types.py mock to match new signature
- Fix relative imports in test_reductionist_json.py

This enables:
- Reductionist-based backends (S3, HTTPS) to support axis-aware reductions via v2 API
- LocalBackend continues to use storage.reduce_chunk() without axis (unchanged)
- Hybrid architecture: axis support at backend level without modifying storage.py contract
- Fix S3Backend.reduce_chunk(): pass request.axis instead of hardcoded axis=None
- Fix HttpsBackend.reduce_chunk(): pass request.axis instead of hardcoded axis=None
- Add sum() method to Active class with axis parameter support
- Update test_storage_types.py mock assertion to expect correct axis=(0,1,2)

This completes the axis parameter chain:
Active.method(axis=N) → _get_selection() → _from_storage() → strategy → ChunkRequest
→ backend.reduce_chunk() → reductionist.reduce_chunk(axis=...)

All 68 tests pass with axis support fully integrated.
Replace all 'from activestorage.config import *' with explicit symbol imports:
- tests/test_bigger_data.py: USE_S3
- tests/test_missing.py: USE_S3
- tests/test_byte_order.py: USE_S3
- tests/test_compression.py: USE_S3, S3_ACCESS_KEY, S3_SECRET_KEY, S3_URL, S3_ACTIVE_STORAGE_URL
- tests/test_harness.py: USE_S3
- tests/test_reductionist_json.py: USE_S3
- tests/test_compression_remote_reductionist.py: USE_S3, S3_BUCKET, REMOTE_RED, S3_ACTIVE_STORAGE_URL
- tests/unit/test_active.py: USE_S3
- tests/unit/test_storage_types.py: S3_ACTIVE_STORAGE_URL, S3_URL
- tests/utils.py: USE_S3, S3_URL, S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET
- bnl/bnl_test.py: removed unused import
- bnl/test_missing_gubbins.py: removed unused import

Benefits:
- Pylance type checking now works correctly
- Code is clearer about dependencies
- Reduces IDE warnings about undefined names
- All 68 tests still pass
The InvalidHDF5File error is now raised during Active.__init__() when the
file is opened (pyfive.File), not later during data access. Update the
non-S3 case to wrap the initialization in pytest.raises context, making
it consistent with the S3 case behavior.

This fixes the test failure after explicit import changes.
- Update active_refactor_v7.puml: Add sum() method to Active class
- Add axis_flow_v1.puml: Sequence diagram showing axis parameter flow
  through the entire reduction pipeline from User → Active → Strategy →
  Backend → Reductionist v2 API
- Add axis_support_architecture_v1.puml: Detailed architecture diagram
  showing how axis is handled across core classes, backends, and the
  reductionist API with practical examples

The updated documentation clearly shows:
- How axis parameter is stored and passed through Active class
- Complete data flow from user call to backend reduction
- Differences between LocalBackend (no axis) and S3/HTTPS backends
- How Reductionist v2 API performs axis-aware reduction
- Example usage patterns for different axis configurations
@bnlawrence bnlawrence requested a review from valeriupredoi April 9, 2026 11:25
Bryan Lawrence added 2 commits April 9, 2026 12:35
@valeriupredoi
Copy link
Copy Markdown
Collaborator

a few more things to tell Mr Claude to take care of @bnlawrence - fairly important 😁

  • recognize storage_type (which in current main is interface_type), and...
  • once it knows it as eg S3 not complain bnl/file.nc can not be found hehe

Bryan Lawrence and others added 5 commits April 9, 2026 14:51
- Restored test_active_axis.py, test_real_https, test_real_s3, test_real_s3_with_axes from origin/main
- Added load_from_https to helpers.py and re-exported from active.py
- Added axis parameter to Active.__init__() for backward compatibility with axis-on-construction tests
- Fixed storage.reduce_chunk() and reduce_opens3_chunk() to preserve array structure for axis-specific reductions
- Added _apply_missing_mask() to keep masked arrays without flattening (replaces remove_missing for axis paths)
- Updated LocalBackend and S3Backend v1 to pass axis parameter to reduce functions
- Added axis validation in Active._get_selection() with proper out-of-range error handling
- Fixed encode_result() to properly serialize numpy arrays/scalars to JSON-compatible types for CBOR

This ensures consistency between local, S3 v1, and remote Reductionist axis-based reduction paths.
@valeriupredoi
Copy link
Copy Markdown
Collaborator

installed from PyPI: Successfully installed ActiveStorage-0.0.1.dev841+g28ede408d cloudpickle-3.1.2 dask-2026.3.0 donfig-0.8.1.post1 google-crc32c-1.8.0 kerchunk-0.2.10 locket-1.0.0 partd-1.4.2 toolz-1.1.0 zarr-3.1.6 - ideally none of these should be installed from PyPI (apart from active storage of course)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants